Fail-Stutter Fault Tolerance

نویسندگان

  • Remzi H. Arpaci-Dusseau
  • Andrea C. Arpaci-Dusseau
چکیده

Traditional fault models present system designers with two extremes: the Byzantine fault model, which is general and therefore difficult to apply, and the fail-stop fault model, which is easier to employ but does not accurately capture modern device behavior. To address this gap, we introduce the concept of fail-stutter fault tolerance, a realistic and yet tractable fault model that accounts for both absolute failure and a new range of performance failures common in modern components. Systems built under the fail-stutter model will likely perform well, be highly reliable and available, and be easier to manage when deployed.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fault Tolerance

Permanent faults : This type of failure is persistent: it continues to exist until the faulty component is repaired or replaced. Examples of this fault are disk head crashes, software bugs, and burnt-out power supplies. Any of these faults may be either a fail-silent failure (also known as a fail-stop) or a Byzantine failure. A fail-silent fault is one where the faulty unit stops functioning an...

متن کامل

Automated design of efficient fail-safe fault tolerance

Both the scale and the reach of computer systems and embedded devices have been constantly increasing over the last decade. As such computer systems become pervasive, our reliance on such systems increases, resulting in our expectation for such systems to continuously deliver services, even in the presence of faults, that is we expect the computer systems to be dependable. One way to ensure the...

متن کامل

A Fail-Safe CMOS Logic Gate

This paper reports a design technique to make Complex CMOS Gates fail-safe for a class of faults. Two classes of faults are denned. The failsafe design presented has limited fault-tolerance capability. Multiple faults are also covered.

متن کامل

A Conceptual Framework for System Fault Tolerance

A major problem in transitioning fault tolerance practices to the practitioner community is a lack of a common view of what fault tolerance is, and how it can help in the design of reliable computer systems. This document takes a step towards making fault tolerance more understandable by proposing a conceptual framework. The framework provides a consistent vocabulary for fault tolerance concept...

متن کامل

Tiered Fault Tolerance for Long-Term Integrity

Fault-tolerant services typically make assumptions about the type and maximum number of faults that they can tolerate while providing their correctness guarantees; when such a fault threshold is violated, correctness is lost. We revisit the notion of fault thresholds in the context of long-term archival storage. We observe that fault thresholds are inevitably violated in longterm services, maki...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001